P-values:

Altman, Douglas G., and J. Martin Bland. 1995. “Statistics Notes: Absence of Evidence Is Not Evidence of Absence.” BMJ 311 (7003): 485.

Baker, Monya. 2016. “Statisticians Issue Warning over Misuse of P Values.” Nature 531 (7593): 151–51.

Javier Mtz.-Rdz.

What is a p-value?

  • Ronald Fisher introduced the p-value in the 1920s.
  • He originally intended it to be an informal method of determining whether a particular observation was worthy of a second look.
“An Image of Ronald Fisher in 1913” (1913)

  • Idea: see if the results were consistent with what random chance might produce.

    1. Set up a ‘null hypothesis’ (\(H_0\)) that you want to disprove.
    2. Assume that this null hypothesis is in fact true.
    3. Calculate the probability of getting results at least as extreme as what was actually observed.

The p-value is the probability of observing data at least as extreme as ours, assuming that the null hypothesis is true.
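
As a minimal sketch of the three steps above, consider a hypothetical coin-flip experiment (not from the original slides): the null hypothesis is that the coin is fair, and we observe 60 heads in 100 flips. The function name `binomial_two_sided_p` and the numbers are illustrative assumptions. "At least as extreme" is taken here to mean "any outcome no more probable than the observed one", which is one common convention for an exact two-sided test.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a binomial experiment (illustrative helper).

    Step 1: the null hypothesis is that each flip lands heads with p_null.
    Step 2: assume that null is true, so outcomes follow Binomial(flips, p_null).
    Step 3: add up the probability of every outcome that is at least as
            unlikely as the observed one.
    """
    def pmf(k: int) -> float:
        return comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)

    observed = pmf(heads)
    # Small relative tolerance so that ties are not lost to rounding error.
    total = sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical data: 60 heads in 100 flips of a supposedly fair coin.
print(f"p-value = {binomial_two_sided_p(60, 100):.4f}")
```

With these assumed numbers the p-value comes out slightly above 0.05, so the same data would be called "not significant" at the conventional threshold even though 60 heads looks suspicious.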

What is a p-value?

Suppose we have a sample of size \(n\). Based on that, we create our null model.

Figure: null model distribution with the significance level and the p-value marked.

When we increase \(n\), the null model distribution becomes narrower.
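
The narrowing can be sketched numerically. The example below assumes a simple z-test with a known standard deviation, which the slides do not specify; the function name `two_sided_z_p` and the observed difference of 0.1 are hypothetical. The standard error of the mean is \(\sigma / \sqrt{n}\), so the same observed difference yields a smaller p-value as \(n\) grows.

```python
from math import erfc, sqrt


def two_sided_z_p(observed_diff: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a z-test for a sample mean (illustrative helper).

    Under the null model the sample mean is Normal(0, sigma / sqrt(n)),
    so the null distribution narrows as n increases.
    """
    standard_error = sigma / sqrt(n)
    z = observed_diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Same hypothetical observed difference (0.1, with sigma = 1) at several sample sizes.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: p = {two_sided_z_p(0.1, 1.0, n):.3g}")
```

A fixed, small difference moves from clearly non-significant to highly significant purely because the sample got larger, which is why sample size has to be weighed alongside the p-value.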

The Fallacy of Statistical Significance

  • The p-value was never meant to be used the way it is used today (Nuzzo 2014).

  • P-values alone cannot determine the truth or importance of research findings (Baker 2016).

    • Just by chance, many conclusions will be wrong.
    • P-hacking (also called data dredging or data snooping): trying multiple analyses until one yields the desired p-value.
  • P-values tend to deflect attention from the actual size of an effect (Nuzzo 2014).
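
To see why trying multiple things inflates the number of wrong conclusions, here is a rough sketch under simplifying assumptions that are not in the original slides: every null hypothesis is true, the tests are independent, and each p-value is therefore uniform on [0, 1]. The function names and the choice of 20 tests are hypothetical.

```python
import random


def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests of a true
    null hypothesis falls below alpha."""
    return 1 - (1 - alpha) ** num_tests


def simulate_p_hacking(num_tests: int, alpha: float = 0.05,
                       num_studies: int = 20000, seed: int = 0) -> float:
    """Fraction of simulated studies that can report a 'significant' result
    after trying num_tests analyses when there is no real effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        # Under a true null, each p-value is a uniform draw on [0, 1].
        p_values = [rng.random() for _ in range(num_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / num_studies


for k in (1, 5, 20):
    print(f"{k:>2} tests: analytic = {chance_of_false_positive(k):.3f}, "
          f"simulated = {simulate_p_hacking(k):.3f}")
```

Under these assumptions, a single test is wrong about 5% of the time, but picking the best of 20 analyses produces a "significant" finding in roughly two studies out of three.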

Cacioppo et al. (2013)

Couples who met online rated their marital satisfaction at 5.6 on a 1-to-7 scale, versus 5.5 for those who met offline.

Nuzzo (2013)

Reporting non-significant findings

  • “Absence of evidence is not evidence of absence” (Altman and Bland 1995).
    • In the example discussed by Altman and Bland (1995), the lack of statistical significance in most of the individual trials led to a long delay before the treatment's true value was recognized.
    • The sample size of controlled trials is generally inadequate, with a consequent lack of power to detect real, and clinically worthwhile, differences in treatment.
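
The link between sample size and power can be sketched with a normal approximation. The example assumes a two-sided, two-sample z-test with equal group sizes and unit standard deviation; the function name `two_sample_power` and the standardized effect size of 0.3 are hypothetical choices, not values from Altman and Bland (1995).

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (illustrative helper).

    effect_size is the true difference in means divided by the common
    standard deviation. With n_per_group observations per arm, the test
    statistic is Normal(effect_size * sqrt(n_per_group / 2), 1).
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical standardized effect of 0.3 at several trial sizes.
for n in (20, 50, 100, 175):
    print(f"n per group = {n:>3}: power = {two_sample_power(0.3, n):.2f}")
```

Under these assumptions, a trial with 20 patients per arm would miss a real effect of this size most of the time, so a non-significant result from such a trial says little about whether the treatment works.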

How big is the problem?

  • Despite these problems, the p-value remains one of the most influential criteria for deciding whether a result is published in a scientific journal.

Z-values extracted from confidence intervals in Medline between 1976 and 2019

How big is the problem?

False positive probability = false positives / (false positives + true positives) = 9 / (9 + 12) ≈ 43%

(Colquhoun 2014; Barnett 2022)
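
The calculation above can be sketched in a few lines. The counts 9 and 12 are taken from the slide as given. The second function expresses the same ratio in terms of the share of tested hypotheses that are real, the power, and the significance level, in the spirit of Colquhoun (2014); the function names and the inputs 10%, 80% and 0.05 are illustrative assumptions.

```python
def risk_from_counts(false_positives: int, true_positives: int) -> float:
    """Share of 'significant' results that are false positives."""
    return false_positives / (false_positives + true_positives)


def false_positive_risk(prior_real: float, power: float, alpha: float = 0.05) -> float:
    """Same ratio, computed from rates instead of counts (illustrative helper).

    prior_real: assumed fraction of tested hypotheses with a real effect.
    power:      probability of detecting a real effect.
    alpha:      significance threshold.
    """
    false_pos = alpha * (1 - prior_real)  # nulls wrongly declared significant
    true_pos = power * prior_real         # real effects correctly detected
    return false_pos / (false_pos + true_pos)


# Counts from the slide.
print(f"From counts: {risk_from_counts(9, 12):.0%}")

# Assumed scenario: 10% of hypotheses are real, 80% power, alpha = 0.05.
print(f"From rates:  {false_positive_risk(0.10, 0.80):.0%}")
```

Even with a well-powered study, the share of significant results that are false can be far higher than the 5% that the threshold seems to promise, because it also depends on how many tested hypotheses are true in the first place.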

Conclusion:

Rethinking the Role of P Values

  • Avoid making scientific conclusions or policy decisions based only on P-values (Baker 2016).
  • You should weigh the evidence properly before drawing any conclusions (e.g. consider sample size).
  • P-values alone are not enough to determine the truth or importance of research findings.
  • It is crucial to ensure transparent reporting and rigorous analysis of scientific research.
  • We should question whether the absence of evidence is a valid justification for inaction (Altman and Bland 1995).






Thanks!

Moving Towards a More Rigorous Approach

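Some new approaches:

  • Recommendations for improving the use of p-values (Benjamin and Berger 2019).
  • Second-generation p-values (Blume et al. 2019).
  • The false positive risk (Colquhoun 2019).
  • Sample-size-dependent significance levels (Gannon, Pereira, and Polpo 2019).
  • A hybrid effect size plus p-value criterion (Goodman, Spruill, and Komaroff 2019).
  • The analysis of credibility (Matthews 2019).
  • Practical benefit in place of effect size (Pogrow 2019).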

References

Altman, Douglas G., and J. Martin Bland. 1995. “Statistics Notes: Absence of Evidence Is Not Evidence of Absence.” BMJ 311 (7003): 485. https://doi.org/10.1136/bmj.311.7003.485.
“An Image of Ronald Fisher in 1913.” 1913. Wikipedia. https://commons.wikimedia.org/wiki/File:Youngronaldfisher2.JPG.
Aschwanden, Christie. 2015. “Science Isn’t Broken.” FiveThirtyEight.
Baker, Monya. 2016. “Statisticians Issue Warning over Misuse of P Values.” Nature 531 (7593): 151–51. https://doi.org/10.1038/nature.2016.19503.
Barnett, Adrian. 2022. “Bad Statistics in Medical Research.”
Benjamin, Daniel J., and James O. Berger. 2019. “Three Recommendations for Improving the Use of p-Values.” The American Statistician 73 (sup1): 186–91. https://doi.org/10.1080/00031305.2018.1543135.
Blume, Jeffrey D., Robert A. Greevy, Valerie F. Welty, Jeffrey R. Smith, and William D. Dupont. 2019. “An Introduction to Second-Generation p-Values.” The American Statistician 73 (sup1): 157–67. https://doi.org/10.1080/00031305.2018.1537893.
Bohannon, John. 2015. “I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here’s How.” Gizmodo. https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800.
Cacioppo, John T., Stephanie Cacioppo, Gian C. Gonzaga, Elizabeth L. Ogburn, and Tyler J. VanderWeele. 2013. “Marital Satisfaction and Break-Ups Differ Across on-Line and Off-Line Meeting Venues.” Proceedings of the National Academy of Sciences 110 (25): 10135–40. https://doi.org/10.1073/pnas.1222447110.
Colquhoun, David. 2014. “An Investigation of the False Discovery Rate and the Misinterpretation of p-Values.” Royal Society Open Science 1 (3): 140216. https://doi.org/10.1098/rsos.140216.
———. 2019. “The False Positive Risk: A Proposal Concerning What to Do About p-Values.” The American Statistician 73 (sup1): 192–201. https://doi.org/10.1080/00031305.2018.1529622.
Gannon, Mark Andrew, Carlos Alberto de Bragança Pereira, and Adriano Polpo. 2019. “Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels.” The American Statistician 73 (sup1): 213–22. https://doi.org/10.1080/00031305.2018.1518268.
Goodman, William M., Susan E. Spruill, and Eugene Komaroff. 2019. “A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting Its Use.” The American Statistician 73 (sup1): 168–85. https://doi.org/10.1080/00031305.2018.1564697.
Harrell, Frank. 2017. “A Litany of Problems With p-Values.” Statistical Thinking. https://www.fharrell.com/post/pval-litany/.
Leek, Jeffrey T., and Roger D. Peng. 2015. “Statistics: P Values Are Just the Tip of the Iceberg.” Nature 520 (7549): 612–12. https://doi.org/10.1038/520612a.
Matthews, Robert A. J. 2019. “Moving Towards the Post p \(<\) 0.05 Era via the Analysis of Credibility.” The American Statistician 73 (sup1): 202–12. https://doi.org/10.1080/00031305.2018.1543136.
Nuzzo, Regina. 2013. “Online Daters Do Better in the Marriage Stakes.” Nature, June. https://doi.org/10.1038/nature.2013.13120.
———. 2014. “Scientific Method: Statistical Errors.” Nature 506 (7487): 150–52. https://doi.org/10.1038/506150a.
Pogrow, Stanley. 2019. “How Effect Size (Practical Significance) Misleads Clinical Practice: The Case for Switching to Practical Benefit to Assess Applied Research Findings.” The American Statistician 73 (sup1): 223–34. https://doi.org/10.1080/00031305.2018.1549101.
“P-Values.” n.d. Xkcd. https://xkcd.com/1478/. Accessed March 15, 2024.
“Significant.” n.d. Xkcd. https://xkcd.com/882/. Accessed March 15, 2024.
Stefan, Angelika M., and Felix D. Schönbrodt. 2023. “Big Little Lies: A Compendium and Simulation of p-Hacking Strategies.” Royal Society Open Science 10 (2): 220346. https://doi.org/10.1098/rsos.220346.
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.

Rethinking the Role of P Values

A little context:

  • [Early 1900s] Neyman vs. Fisher.
  • [1995] Altman and Bland's (1995) critique of the evaluation of non-significant findings.
  • [2014] Nuzzo's (2014) critique of the use of p-values and quantitative methodology.
  • [2015] Leek and Peng's (2015) critique of the use of p-values and quantitative methodology.
  • [2015] Basic and Applied Social Psychology (BASP) bans p-values.
  • [2016] The ASA's Statement on p-Values, by Wasserstein and Lazar (2016).